INTRODUCTION


PROBLEM 

There are currently several large and long-running programs being processed
on the AFWL scientific computers (CDC 6600, CYBER 176). They are large in terms
of the main memory, extended main memory (ECS/LCM), or disk storage requirements,
and long running in terms of central processor or total real time required to
execute. These programs must compete for computing resources along with all other
jobs which run on the scientific machines. The other jobs include small interactive
programs, debug jobs, one-time special purpose jobs, accounting jobs, and
others which individually do not require a significant percentage of the computer
resources, but together comprise a substantial portion of the total workload.

The  result  of  this  competition  is  that  the  large  jobs  are  run  only  during  nonpeak 
workload  hours  or  the  other  work  receives  less  than  satisfactory  response  and 
turnaround.  Usually  a  combination  of  these  two  results  occurs  wherein  special 
times  are  scheduled  to  run  the  large  codes  or  they  are  run  along  with  the  other 
jobs. This results in degraded response for all jobs and inefficient machine use.

In addition to the programs which are currently being run, some can be
identified for which requirements exist but which are not being run due to lack
of sufficient computer resources. In some cases these are the 3-dimensional
(3-D) versions of currently running 2-D codes. In all cases they are not being
run due to lack of sufficient memory or CP power.

PROPOSED  SOLUTION 

Many significant advances in minicomputer technology have been achieved
during the past several years. These include the development of fast memories;
high bandwidth memory and I/O busses; cache memories to provide faster effective
memory access; powerful floating point processors, including several general
purpose array processors; and large, fast disk storage devices. It is possible
that a suitably equipped minicomputer system could be configured which would
support the processing and I/O requirements of many of the large codes which are
a problem in the current large machine environment.

The minicomputer approach has several advantages. Run in a dedicated
(non-multitasking) manner, the minicomputer could be operated under the direct
control of one group or project for an assigned period of time. During this time,
large blocks could be reserved for one or more problems. Overhead due to
scheduling and job swapping would be eliminated. A code which receives one-day
turnaround for two CP hours of 6600 processing could conceivably be turned
around in 2 or 3 hours on the minicomputer. A problem which runs for 40 hours
of CP time could be completed in days rather than weeks. In general, although
the raw computing power of the minicomputer might be less than the 6600,
turnaround times for some codes could be dramatically reduced.
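
The turnaround argument above reduces to simple arithmetic. The following sketch is illustrative only: the slowdown factors of 1.0 and 1.5 are assumptions back-derived from the two CP hour example (2 or 3 hours on the minicomputer), not measured values, and the function name is hypothetical.

```python
def dedicated_turnaround_hours(cp_hours_6600, slowdown):
    """Wall-clock hours on a dedicated minicomputer.

    slowdown is the assumed ratio of minicomputer run time to 6600 CP time.
    With scheduling and job swapping removed, wall-clock time is taken to
    equal run time.
    """
    return cp_hours_6600 * slowdown


for cp_hours in (2, 40):
    low = dedicated_turnaround_hours(cp_hours, 1.0)
    high = dedicated_turnaround_hours(cp_hours, 1.5)
    print(f"{cp_hours} CP hours -> {low:.0f} to {high:.0f} hours dedicated")
```

Under these assumptions a 40 CP hour problem would finish in 40 to 60 hours of continuous dedicated running, which is consistent with "days rather than weeks."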

SCOPE 

The  problem  of  determining  the  suitability  and  economic  advantages  of  using 
minicomputers  to  support  large  computer  codes  involves  identifying  the  main  areas 
of  concern  and  performing  systematic  investigations  of  those  areas.  In  addition, 
some  benchmark  codes  must  be  developed  to  measure  the  performance  of  selected 
minicomputers in comparison with the performance on the large machines. If it is
determined  that  the  minicomputer  approach  is  justified,  then  code  conversion 
and  development  efforts  must  be  undertaken. 

A list of the main areas of concern which have been identified follows.
In general, they are listed in the order in which they should be addressed.
However, in some cases some overlap can occur. For example, the benchmark
set could be developed during the selection and procurement process.

•  Identify  candidate  codes. 

•  Determine  requirements  of  a  minicomputer. 

•  Accomplish  an  industry  survey  of  minicomputer  hardware. 

•  Determine  code  conversion  requirements. 

•  Perform  an  economic  analysis. 

•  Develop  a  benchmark  set. 

•  Select  and  procure  a  minicomputer  system.

•  Perform  the  code  conversion  and  documentation. 

•  Document  results  of  the  effort. 

BACKGROUND 

During the preparation of this report, the first five areas were investigated
in some detail. This included interviewing various key people in the AFWL computer
user community to identify candidate codes. Some listings were obtained and
studied to determine the amount of effort involved in the conversion process, and
general minicomputer requirements were established. Minicomputer hardware which
could support the effort was narrowed to three vendors, and the economic analysis
was performed using those vendors. The five areas which have been investigated
and the four remaining task areas are discussed in greater detail throughout the
remainder of this document.

CANDIDATE  CODES


GENERAL 

The first step in the effort to identify candidate codes was to identify
existing and proposed codes which satisfy the criteria given in the Introduction
section, and for which there exist reasonably long term requirements. Reasonably
long term is loosely defined as a period of time which would economically
justify the code conversion and documentation efforts required, and depends
largely upon the extent of the conversion effort and the projected amount of use
of the code. The following criteria served as guidelines in identifying prospective
codes. One or more were used to determine whether or not a particular
code was a candidate to be investigated further. The criteria are stated in
terms of current or projected 6600/176 requirements.

• Requires more than 1 hour of 6600 CP time.

• Runs longer than 1 hour on a dedicated machine.

• Typically requires more than 1 day to turn around.

• Requires more than 100 K (octal) words of main memory.

• Requires more than 200 K (octal) words of extended main memory.

• Transfers more than ten million words of I/O.

• Requires more than four million words of disk storage.

• Is classified.
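
The screening rule (a code is a candidate if one or more criteria are met) can be sketched as follows. This is a minimal illustration, not a tool used in the study. The class and field names are hypothetical, and "100 K (octal)" is read here as 100000 octal words, which is an assumption about the notation.

```python
from dataclasses import dataclass

# Thresholds from the criteria list. The octal readings are an assumption.
MAIN_MEMORY_WORDS = 0o100000      # 32,768 decimal words
EXTENDED_MEMORY_WORDS = 0o200000  # 65,536 decimal words


@dataclass
class CodeProfile:
    """Current or projected 6600/176 requirements of one code (hypothetical fields)."""
    cp_hours: float = 0.0
    dedicated_hours: float = 0.0
    turnaround_days: float = 0.0
    main_memory_words: int = 0
    extended_memory_words: int = 0
    io_words: int = 0
    disk_words: int = 0
    classified: bool = False


def is_candidate(p):
    """Return True if one or more of the eight criteria is met."""
    return any((
        p.cp_hours > 1,
        p.dedicated_hours > 1,
        p.turnaround_days > 1,
        p.main_memory_words > MAIN_MEMORY_WORDS,
        p.extended_memory_words > EXTENDED_MEMORY_WORDS,
        p.io_words > 10_000_000,
        p.disk_words > 4_000_000,
        p.classified,
    ))


# Hypothetical profiles for illustration only.
small_job = CodeProfile(cp_hours=0.1, main_memory_words=20_000)
long_job = CodeProfile(cp_hours=2.0)
print(is_candidate(small_job), is_candidate(long_job))
```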

At this time, five codes have been identified as candidates for conversion
to the minicomputer environment. These are discussed in the following paragraphs.
Table 1 summarizes the important characteristics of these codes.

ALFA 

ALFA  is  a  finite  difference  code  being  run  by  AFWL/ARAO  personnel  in  support 
of  the  laser  R&D  effort.  It  requires  a  small  preprocessor  called  DYNDIM.  Both 
are  entirely  FORTRAN  programs.  ALFA  requires  up  to  a  half  million  words  of  memory 
and  can  run  for  as  long  as  2  hours  on  the  CDC  6600.  Its  use  is  anticipated  for  at 
least  two  more  years. 

APACHE 

This is another finite difference code with characteristics similar to ALFA.
The  primary  physical  differences  are  that  APACHE  requires  up  to  one  million  words 
of  memory  and  could  run  as  long  as  50  CP  hours  on  a  problem  on  the  CDC  6600. 

TABLE 1. CANDIDATE CODES

Name      Memory Rqmts*     CY 176 CP Time    Precision/Range

ALFA      165 K/635 K       5 min - 2 hrs     3 digits/10^(+20,-20)
APACHE    256 K/985 K       2 - 50 hrs        3 digits/10^(+20,-20)
HULL      256 K/1959 K**    30 - 40 hrs       12 digits/10^(+30,-30)
NASTRAN   96 K/734 K**      Minutes           12 digits/10^(+30,-30)
QUANTA    256 K/985 K       Minutes           6 digits/10^(+4,-4)

*Memory requirements are listed in decimal numbers as total SSM + LCM
CY176 60-bit words/total minicomputer 8-bit bytes. The minicomputer
requirement assumes that 95 percent of the CY176 memory requirements
consists of floating point data.

**Requires double precision floating point numbers.
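
The word-to-byte conversion described in the table footnote can be approximated as shown below. This is a sketch under stated assumptions, not the authors' worksheet: floating point data is taken as 4 bytes (single precision) or 8 bytes (double precision) on the minicomputer, and the remaining 5 percent is taken as 1 byte per CY176 word. The footnote does not say how the non-floating-point portion was sized; 1 byte per word is assumed here only because it comes within about 1K of the listed values.

```python
def minicomputer_kbytes(cy176_kwords, double_precision=False,
                        float_fraction=0.95, other_bytes_per_word=1):
    """Estimate minicomputer K bytes for a requirement given in CY176 K words.

    other_bytes_per_word is an assumption, not a figure from the report.
    """
    float_bytes = 8 if double_precision else 4
    bytes_per_word = (float_fraction * float_bytes
                      + (1 - float_fraction) * other_bytes_per_word)
    return cy176_kwords * bytes_per_word


# (K words, needs double precision, K bytes listed in the table)
table = {
    "ALFA": (165, False, 635),
    "APACHE": (256, False, 985),
    "HULL": (256, True, 1959),
    "NASTRAN": (96, True, 734),
    "QUANTA": (256, False, 985),
}

for name, (kwords, double, listed) in table.items():
    estimate = minicomputer_kbytes(kwords, double)
    print(f"{name:8s} estimated {estimate:7.1f} K   listed {listed} K")
```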


HULL 

The HULL code is currently being investigated for conversion to run on the
CRAY 1 at the AFWL. One version or another of HULL has been running at the AFWL
for  several  years,  and  its  use  is  expected  to  continue.  A  typical  HULL  problem 
can  require  up  to  40  CP  hours  of  CYBER  176  time  to  complete  and  can  use  a  million 
words  of  memory.  It  is  expected  that  HULL  will  be  one  of  the  most  difficult  codes 
to  convert  for  this  effort. 

NASTRAN 

NASTRAN is a structural analysis code which is used rather heavily at AFWL.
However, the current version is supported by a contractor and is proprietary. It
is probably not feasible to accomplish the conversion using in-house resources.
At least two minicomputer versions exist at this time. A version for the PRIME is
available through Schaeffer Analysis in Mt. Vernon, New Hampshire, and a VAX version
is available through MacNeal-Schwendler in Los Angeles.

QUANTA 

QUANTA is a war gaming simulation code concerned with weapons allocation and
optimization. Although the code requires only minutes of CPU time, the production
version is expected to require a million words of memory and will be classified.
These characteristics, together with the fact that it is written entirely in FORTRAN,
make QUANTA a good candidate for the minicomputer environment.



AFWL-TR-80-20


MINICOMPUTER  REQUIREMENTS 


GENERAL


After identifying candidate codes, the baseline requirements of a minicomputer
were formulated. This was done using the following general guidelines:

• Overlaying requirements should be kept to a minimum. Therefore,
a large address space is necessary.

• Floating point range and accuracy requirements must be satisfied.

• Some tradeoffs can be made in the area of required hardware
performance versus improved turnaround in a dedicated or
semidedicated environment.

HARDWARE  CAPACITY  AND  PERFORMANCE  CONSIDERATIONS 

The  following  characteristics  relating  to  capacity  or  performance  of 
various  hardware  features  were  considered: 

• Main memory capacity.

• Main memory addressability.

• Main memory bandwidth.

• Mass storage capacity.

• Mass storage transfer rate.

• I/O bus bandwidth.

• CP timing (typical or average add, multiply, and divide times).

• Floating point accuracy (significant digits).

• Range of exponent.

GENERAL  HARDWARE  REQUIREMENTS 

The  following  are  general  requirements  which  the  hardware  must  support: 

• Large capacity (100 M bytes) removable disk packs (3330/844 technology)
must be available.

• The hardware must support direct-memory-access I/O.

SOFTWARE  REQUIREMENTS 

The  following  requirements  must  be  satisfied  by  the  supplied  software: 

• The operating system must support multitasking and foreground/
background modes of operation.

• Sequential and direct access (random) files must be supported.

• The file system must contain privacy and protection features.

• As a minimum, an optimizing ANSI FORTRAN compiler must be available.
A PASCAL compiler is desirable.

• A complete mathematical library must be provided, including
single and double precision trigonometric routines.

• File editing and sort utilities must be provided.

• ASCII nine track tape format must be supported.

• Graphics software must be available.

MEMORY  REQUIREMENTS 

Given a minicomputer system which is otherwise suited to the application for
which it is being used, lack of sufficient addressable memory is an annoying and
restrictive problem with most machines. In fact, this problem is not confined to
minicomputers; the large scientific computers in use today also suffer from lack
of memory. Several minicomputers are available today which have solved this problem,
or at least eased it to a large extent. The virtual memory systems offer
enough addressable memory to satisfy even the most voracious memory users.

Most minicomputer systems still restrict the programmer to a 64 K byte
address space. Due to the magnitude of the programming and conversion problems
which can be caused by this restriction, it was decided that no system so limited
would be considered a viable candidate system for this application. Rather, a
virtual memory system will be required. In addition, a decision was made to consider
only 32-bit machines. This would generally provide greater speed over the
16-bit systems and more compatibility and transportability over the 22-bit
(HP3000-III) and 24-bit (Harris 570) machines. These restrictions limited the choices to
the three vendors which are discussed in the following section.
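
To illustrate why a 64 K byte address space was ruled out, the sketch below compares it with the minicomputer byte requirements listed for the candidate codes in the candidate codes table. It is an illustration only: K is taken as 1024 bytes, which is an assumption, and the segment count is a rough lower bound on how many 64 K byte pieces a program and its data would have to be overlaid into.

```python
# Minicomputer memory requirements in K bytes, from the candidate codes table.
REQUIRED_KBYTES = {
    "ALFA": 635,
    "APACHE": 985,
    "HULL": 1959,
    "NASTRAN": 734,
    "QUANTA": 985,
}

K = 1024  # assumed meaning of "K"


def address_bits_needed(kbytes):
    """Smallest number of address bits that can span the given K bytes."""
    return (kbytes * K - 1).bit_length()


def segments_of_64k(kbytes):
    """Minimum number of 64 K byte segments needed to hold the given K bytes."""
    return (kbytes + 63) // 64


for name, kbytes in REQUIRED_KBYTES.items():
    print(f"{name:8s} {kbytes:5d} K bytes: "
          f"{address_bits_needed(kbytes)} address bits, "
          f"at least {segments_of_64k(kbytes)} segments of 64 K")
```

Every candidate code needs at least 20 address bits by this estimate, against the 16 bits that a 64 K byte address space provides.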

SURVEY  OF  INDUSTRY  MINICOMPUTER  HARDWARE


OFF-THE-SHELF  SYSTEMS 

After the requirements for a minicomputer system were determined, an industry
survey was accomplished to decide if those requirements could be satisfied.
Initially, the list of possible vendors was lengthy and included such vendors as
Data General, Hewlett-Packard, Honeywell, Modcomp, Harris and Sperry-Univac.
However, the virtual memory and 32-bit architecture constraints reduced the list
to Prime, IBM and Digital Equipment Corporation.

Table 2 lists some of the important characteristics of each vendor's candidate
system. Note that the table does not include any mention of mass storage
devices. Each of the vendors supports a disk system using the IBM 3330 technology.
This is a 200M-300M byte disk storage unit with an average access time of 30 ms
and a maximum transfer rate of 1.2 MB per second.

PRIME. PRIME Computer, Inc., of Framingham, Mass., offers a wide range of
performance in its minicomputer line, from the single user PRIME 100 and 200 to
the multiuser 16-bit PRIME 300, 350, 400 and 500 to the 32-bit PRIME 550, 650
and 750. The PRIME 750 is in the VAX-11/780 and IBM 4341 class. Some features
of PRIME computers, in general, and of the PRIME 750, in particular, are discussed
in the following paragraphs.

Central Processor. The central processor uses 48-bit instruction words
and includes hardware implemented multiply and divide instructions in both single
and double precision. The advertised floating point instruction times for the 750
are somewhat slower than the VAX-11/780 times but should provide competitive
performance. One very important feature of the CPU is the ease with which the machine
can be upgraded. For example, to upgrade from a PRIME 350 to a 400 involves replacing
one board. This will be important in the future if an increase in computing
power becomes necessary after PRIME announces a machine which is faster than the
750.

Memory. The PRIME 750 memory system consists of up to 8M bytes of MOS,
540 ns cycle time, error correcting memory. This is augmented by a 16K byte,
80 ns cache memory which reduces the effective cycle time to 149 ns. The memory
is built using 16K chips with 256K bytes on a board. Therefore, 1M bytes of
memory occupies just four vertical inches of cabinet space.
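
The relation between the main memory cycle time, the cache cycle time, and the quoted effective cycle time can be examined with a simple two-level model, effective = h x cache + (1 - h) x main, where h is the cache hit ratio. The model is an assumption made here for illustration; the report does not state how the vendor derived the 149 ns figure.

```python
def implied_hit_ratio(main_ns, cache_ns, effective_ns):
    """Solve effective = h * cache + (1 - h) * main for the hit ratio h."""
    return (main_ns - effective_ns) / (main_ns - cache_ns)


def effective_cycle_ns(main_ns, cache_ns, hit_ratio):
    """Average cycle time for a given hit ratio under the two-level model."""
    return hit_ratio * cache_ns + (1 - hit_ratio) * main_ns


# PRIME 750 figures quoted above: 540 ns main memory, 80 ns cache, 149 ns effective.
h = implied_hit_ratio(540, 80, 149)
print(f"implied hit ratio: {h:.2f}")
print(f"check: {effective_cycle_ns(540, 80, h):.0f} ns")
```

Under this model the quoted PRIME 750 figures correspond to a hit ratio of 0.85.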

TABLE 2. SUMMARY OF CANDIDATE SYSTEMS

                                   PRIME 750          VAX-11/780         IBM 4341†

Memory:
  Capacity/Interleaving            8M bytes/2X        8M bytes/2X        4M bytes/*
  Addressability (virtual)         32M bytes          2.2G bytes         16M bytes
  Cycle Time/Effective (w/cache)   540 ns/149 ns      600 ns/280 ns      */225 ns
  Bandwidth                        8M bytes/s         13.3M bytes/s      9M bytes/s
  Cache Capacity/Cycle Time        16K bytes/80 ns    8K bytes/200 ns    8K bytes/150 ns

CPU:
  Floating Point Timing (us)
    Single Precision  Add          0.9                0.8                1.4
                      Multiply     2.1                1.2                3.8
                      Divide       *                  4.2                6.3
    Double Precision  Add          1.1                1.4                1.4
                      Multiply     3.2                3.4                5.4
                      Divide       6.5                8.8                11.0
    Extd Precision    Add          **                 **                 3.8
                      Multiply     **                 **                 15.7
                      Divide       **                 **                 **
  F.P. Accuracy in digits
    (single/double/extended)       6/13/**            6/13/**            6/13/33
  F.P. Range                       10^(+38,-38)       10^(+38,-38)       10^(+75,-79)

*  Value not available.

** Feature not available.

†  The IBM 4341 is a mainframe which is treated as a minicomputer for the purpose
of this report.

PRIMOS IV is the operating system which supports the PRIME virtual
memory  system.  The  PRIME  750  offers  32M  bytes  of  virtual  address  space  to  each 
user  and  256M  bytes  to  be  shared  by  all  users. 

Input/Output. The PRIME 750 supports a maximum transfer rate of 8M bytes
per  second.  Up  to  2.4  billion  bytes  of  disk  storage  can  be  installed  on-line, 
including  disks  ranging  from  a  300K  byte  dual  floppy  or  3M  byte  cartridge  to  a 
300M  byte  removable  disk  pack. 

Software. The PRIME 750 software includes the PRIMOS IV operating system;
a BASIC interpreter; FORTRAN, COBOL, RPG II, and BASIC/VM compilers; and a
macroassembler. The operating system is written in FORTRAN, and the source code is
available at no additional charge. PRIME also offers networking software which
allows interconnection of up to three PRIME computers with full file sharing
capabilities. The remote job entry (RJE) emulator software consists of a single
executive and HASP, 200 UT, 2780, and ICL 7020 emulator modules. The operating
system and compilers support shared reentrant code, and the BASIC/VM compiler is
itself reentrant. Finally, all applications software written for a PRIME computer
is upward and downward compatible throughout the entire line of PRIME
computers.

DEC. Digital Equipment Corporation is the leading manufacturer of minicomputer
systems. The VAX-11/780 was announced in 1977, and is the only 32-bit
minicomputer offered by DEC.

Central Processor. The VAX uses a microcoded 32-bit instruction set and
can handle two classes of instructions. In native mode, the VAX executes instructions
designed for the machine; and in PDP-11 mode, the PDP-11 instruction set is
available. The system can execute native programs and PDP-11 programs concurrently,
and context switching is done in one instruction. Also, the microcode feature
allows the user to implement single instructions to execute processes which would
normally require many instructions to complete.

Memory. The VAX can support up to 8M bytes of MOS memory and offers 4G
bytes of virtual memory. The 8K byte cache system combines with main memory to
provide an effective cycle time of 280 ns. Up to 2^31 bytes (2.2G bytes) of
virtual memory is available to each user.

Input/Output. The VAX can support a maximum transfer rate of 13.3M bytes
per second, matching the internal bus bandwidth. This rate is made possible using
buffered memory controllers. The operating system software supports the high data
transfer rates by using the hardware priority levels to decrease response time,
and by overlapping disk data transfer operations with seek requests.

Software. VAX/VMS is the virtual memory operating system written for the
VAX-11/780. It supports real-time, interactive, and batch environments concurrently
or in any combination. The operating system supports FORTRAN, COBOL, BASIC, RPG
II, BLISS, and MACRO assembler programming languages. Other features offered by
VAX/VMS include record management services (RMS) for general purpose file and
record handling capabilities, user access procedures, resource and performance
statistics, error logging, and on-line diagnostics.

IBM. The IBM 4300 series, although it is not considered a minicomputer, possesses
most of the virtues of one (small size, air-cooled, low cost, commercial
60-cycle power required). It will be considered a minicomputer for this report.

Central Processor. The IBM 4341 uses a microcoded instruction set. Performance
tests have shown the 4341 to run up to 3.2 times faster than the IBM
System/370 Model 138, and up to 3.14 times faster than a System/370 Model 148
executing the same FORTRAN job (Ref. 1). Extended precision floating point
arithmetic is standard in the 4341, and offers up to 33 decimal digits of precision.

Memory. The 4341 uses 8K bytes of cache memory, 2M or 4M bytes of processor
storage (physical memory), and 16M bytes of virtual storage. Memory is accessed
in eight byte blocks. The effective memory cycle time typically ranges from 150
to 300 ns.

Input/Output. The processor can accommodate three or six integrated data
channels, and can realize a data transfer rate of 9M bytes per second. This is
accomplished with the standard byte multiplexer and two block multiplexer channels
plus the three optional block multiplexer channels. Each channel can support up
to eight external device controllers.

Software. Any program written for an IBM System/370 will run on the 4341
provided it is not time-dependent; is not dependent on configuration-unique features
such as memory size, specific I/O equipment, or optional features being present when
not included in the configuration; is not dependent on system facilities such as
interrupts or certain operation codes being absent when they are included in the
configuration; and is not dependent on results or functions which IBM specifies
to be model-dependent or unpredictable. Any program written for a System/360 will
run on the 4341 provided the above constraints are satisfied and provided it does
not depend on functions which differ between the System/360 and System/370 (Ref. 2).

Operating systems which support the IBM 4341 include DOS/VSE, VM/370
Release 6, and OS/VS1 Release 7. The latter two operating systems provide new
functions and complete support for the 4341.

An extensive selection of compilers, communications software, and operating
system support software is available for the 4341. Compilers include FORTRAN,
COBOL, PL/I, and RPG II. Other software includes Sort/Merge, various communications
access methods, and data base management software.


1. IBM 4341 Processor Facts Folder, G520-3387-0, IBM Corp., White Plains, New York,
1979.

2. A Datapro Feature Report, IBM 4300 Series, Datapro Research Corp., Delran,
New Jersey, 1978.

ARRAY  PROCESSORS

In some cases, peripheral devices are available which could enhance some
aspect of system performance, but which are neither manufactured nor supplied by the
system vendor. If such a device is considered essential, then any candidate minicomputer
system must be capable of interfacing with and supporting it. One
example is an array processor of the type manufactured by Floating Point Systems,
Inc. Due to the relatively slow floating point processor speeds of the current
minicomputers (compared with the CYBER 176), an array processor could be required
to provide sufficient CP power.

The array processor is a new species of computer which is feasible because of
the development of a complementary machine architecture and instruction set.
Optimized tradeoffs in timing characteristics permit the use of medium and
large-scale integrated circuits. Thus, off-the-shelf components are available to build
a device that is capable of 10 to 100 times or more the floating point speeds
of most standard computer products which are currently available (Ref. 3).
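
As a rough check on the speed ratio quoted above, the sketch below converts the single precision add times listed for the three candidate systems in Table 2 into a nominal rate and compares each with the 12 MFLOPS peak quoted for the AP-120B later in this section. Treating the reciprocal of the add time as a sustained rate is a simplifying assumption; it ignores memory access, loop overhead, and the mix of operations in a real code.

```python
# Single precision add times in microseconds, from Table 2.
ADD_TIME_US = {"PRIME 750": 0.9, "VAX-11/780": 0.8, "IBM 4341": 1.4}

# Peak rate quoted for the AP-120B array processor.
AP120B_PEAK_MFLOPS = 12.0


def mflops_from_add_time(add_time_us):
    """Nominal rate if one add completes every add_time_us microseconds."""
    return 1.0 / add_time_us


for name, add_time in ADD_TIME_US.items():
    host = mflops_from_add_time(add_time)
    ratio = AP120B_PEAK_MFLOPS / host
    print(f"{name:11s} {host:4.2f} MFLOPS nominal, AP-120B peak is {ratio:4.1f}x")
```

By this crude measure the array processor peak is roughly 10 to 17 times the nominal host add rate, which falls at the low end of the quoted range.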

A large number of general purpose array processors are available at this time,
and new or improved versions are being introduced which provide even greater speed.
Brief descriptions of some of these processors follow.

Floating Point Systems (AP-120B). The first company to introduce a
general purpose, inexpensive, floating point array processor was Floating Point
Systems, Inc. (FPS). The AP-120B operates in parallel with the host computer
and can be interfaced to many different minicomputers and large-scale processors.
It is a programmable, synchronous pipelined processor consisting of a number of
fast registers, floating point adder, floating point multiplier, integer address
calculator/indexer, and memory. Conversion from and to the host processor floating
point format is performed on the fly to an internal 38-bit format which uses a
10-bit binary exponent and 28-bit two's complement mantissa.
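
The precision and range implied by the 38-bit internal format can be estimated as follows. This is an approximation under stated assumptions: one of the 28 mantissa bits is treated as the sign (two's complement), and the 10-bit exponent is assumed to be split symmetrically between positive and negative powers of two. The actual exponent bias is not given in this report.

```python
import math


def decimal_digits(mantissa_bits, includes_sign=True):
    """Approximate decimal digits carried by a binary mantissa."""
    magnitude_bits = mantissa_bits - 1 if includes_sign else mantissa_bits
    return magnitude_bits * math.log10(2)


def decimal_exponent_range(exponent_bits):
    """Approximate largest decimal exponent, assuming a symmetric binary exponent."""
    max_binary_exponent = 2 ** (exponent_bits - 1)
    return max_binary_exponent * math.log10(2)


print(f"precision: about {decimal_digits(28):.1f} decimal digits")
print(f"range: about 10^(+/-{decimal_exponent_range(10):.0f})")
```

These estimates (about 8 digits and a range near 10^(+/-154)) can be compared against the precision and range columns of the candidate codes table.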

There is a large amount of general purpose software available for the
average FORTRAN programmer. Access to the processor is accomplished through
standard FORTRAN subroutine calls to a library of FPS interface routines. This
library is available for each processor/operating system combination which can be
interfaced to the array processor, and is responsible for handling all parameter
passing and data transfers and for transmitting and initiating the array processor's
programs. In addition, software is available for algorithm development and debug
off-line to the array processor.

3. Davis, A. F., "Array Processor," Industrial Research, 1977.

The AP-120B has a 167 ns instruction cycle time and memory fast enough
to keep up. Due to the segmented functional unit architecture, a floating point
add can be initiated on each cycle with a result available in two cycle times
(333 ns); a floating multiply can be started each cycle with a result available
in three cycle times (500 ns). Since the add and multiply can proceed concurrently,
it is possible to achieve a maximum of 12 MFLOPS.
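
The 12 MFLOPS figure follows from the cycle time: with one add and one multiply started on every cycle, two floating point results are produced per cycle once the pipelines are full. A minimal sketch of that arithmetic is given below; the function names are illustrative.

```python
CYCLE_NS = 167  # AP-120B instruction cycle time quoted above


def peak_mflops(cycle_ns, results_per_cycle):
    """Peak rate when results_per_cycle operations complete every cycle."""
    return results_per_cycle * 1000.0 / cycle_ns


def latency_ns(cycle_ns, stages):
    """Time from starting an operation to its result, for a pipeline of given depth."""
    return stages * cycle_ns


print(f"add latency:      {latency_ns(CYCLE_NS, 2)} ns")
print(f"multiply latency: {latency_ns(CYCLE_NS, 3)} ns")
print(f"peak rate:        {peak_mflops(CYCLE_NS, 2):.1f} MFLOPS")
```

With the rounded 167 ns cycle the latencies come out as 334 and 501 ns, within rounding of the 333 and 500 ns quoted. The peak is reached only when both functional units are kept busy on every cycle.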

Floating Point Systems (FPS-100). The FPS-100 is a newer processor which
is upward compatible with the AP-120B. Therefore, software written for the
AP-120B will also run on the FPS-100. Although the FPS-100 was designed to
satisfy real-time requirements, its smaller price tag and relatively high speed
operation (up to 8 MFLOPS) make it a viable candidate for use in the large code
processing environment.

The FPS-100 combines a priority-interrupt structure with a new
resident multitasking operating system. Three internal priority levels and 15
external levels (for receipt of clock and I/O device interrupts) are available
for real-time program control. In addition to a library of over 250
FORTRAN-callable math routines, a resident FORTRAN compiler is available.

Also available with the FPS-100 is a new programmable General Purpose
Input/Output Processor (GPIOP) for interfacing to a wide variety of standard or
custom peripheral devices. This includes the ability to interface up to 1.2G
bytes of on-line data storage through 80M byte or 300M byte disk options.

Computer Design and Applications (MSP). Computer Design and Applications,
Inc. (CDA), makes a microprogrammable 24-bit array processor. The Micro Signal
Processor (MSP) is not nearly as flexible as the FPS processor, since all software
is stored in PROM and cannot be loaded dynamically as in the AP-120B.
However, an assembler and simulator are provided for program development on the
host, and the processor code can be referenced from a FORTRAN program. The MSP
is capable of about half the speed of the AP-120B.

Computer Signal Processing (MAP 300). Computer Signal Processing, Inc.
(CSPI), makes the Macro Arithmetic Processor (MAP) array processor. This 32-bit
processor is user programmable and comes with an extensive library of FORTRAN
callable routines which act as user interface modules on the host computer.
The MAP is designed to be a powerful data acquisition, auxiliary fast I/O, and
signal processing device. It is also an arithmetic array processor. The MAP
contains programmable, bidirectional interfaces between external devices and MAP
memory. These devices (called scrolls) relieve the host of the I/O burden of
feeding the MAP, and operate at data rates up to 36M bytes per second.

The MAP 300 contains a Central System Processor Unit (CSPU) executive
processor and two arithmetic processors. The CSPU initiates processing
sequences in the arithmetic processors and controls data flow in the system,
including control of the I/O scrolls. It has a 16-bit fixed point arithmetic
unit with a 125 ns cycle time and a sufficiently large instruction repertoire to
accomplish its control functions.

Each arithmetic processor unit has completely parallel add and multiply
functional units, and this parallelism exists between the processors. Therefore,
two adds and two multiplies can be accomplished simultaneously. The processor
performs 32-bit self-normalizing computations using an IBM format which consists
of a six-hexadecimal digit mantissa, a sign bit, and a 7-bit hexadecimal exponent.
Floating point operands have a dynamic range of 10^+75 to 10^-77.
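
As an illustration of the floating point format described above, the following
Python sketch decodes one 32-bit word into a native value. The source does not
give the bit positions or the exponent bias. The layout used here (sign in the
top bit, then a 7-bit excess-64 power-of-16 exponent, then a 24-bit fraction)
is that of IBM System/360 single precision and is only assumed to apply to the
MAP 300. The name ibm32_to_float is hypothetical.

```python
def ibm32_to_float(word: int) -> float:
    """Decode a 32-bit IBM-style hexadecimal floating point word.

    Assumed layout (System/360 single precision, not confirmed for the MAP):
      bit 31      sign
      bits 30-24  exponent, excess-64, base 16
      bits 23-0   fraction (six hexadecimal digits), value in [0, 1)
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("word must fit in 32 bits")
    sign = -1.0 if (word >> 31) & 1 else 1.0
    exponent = (word >> 24) & 0x7F
    fraction = (word & 0xFFFFFF) / float(1 << 24)
    return sign * fraction * 16.0 ** (exponent - 64)


if __name__ == "__main__":
    print(ibm32_to_float(0x41100000))  # fraction 1/16 scaled by 16**1
    print(ibm32_to_float(0xC276A000))  # negative value scaled by 16**2
    print(ibm32_to_float(0x7FFFFFFF))  # largest representable magnitude
```

Under these assumptions the largest magnitude is just under 16^63, about
7.2 x 10^75, which is consistent with the upper end of the range quoted above.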

The MAP 300 can directly address up to 256K bytes of 32-bit memory
on each of three busses. Since each bus operates independently, each of several
processors can access a memory simultaneously and continuously on a cycle-stealing
basis. This allows for very high speed processing situations where input,
output, and processing operations are overlapped. Memory is available as 500 ns,
300 ns, or 170 ns cycle time MOS.

Computer Signal Processing (MAP 6400). The MAP 6400 is another recent
addition to the array processor field. It interfaces to most popular 16- and
32-bit minicomputers and uses a 64-bit hexadecimal floating point number format
to provide 16 decimal digits of accuracy. It operates in parallel with the host
processor to provide results at speeds 10 to 1000 times faster than a
minicomputer executing similar 64-bit calculations. For example, 1 second is
required to calculate the product of two 100 x 100 real matrices. Including the
time required for all data fetches and stores and for program control, this
computation yields an effective rate of 2 MFLOPS. More detailed information on
the internal architecture of the MAP 6400 is not available at this time.
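
The 2 MFLOPS figure can be checked with a short calculation. The sketch below
assumes the conventional operation count of 2n^3 for an n x n matrix product
(n multiplies and about n adds for each of the n^2 result elements). The source
does not state how the operations were counted, and the function names are
illustrative only.

```python
def matmul_flops(n: int) -> int:
    """Floating point operations for the product of two n x n matrices.

    Each of the n*n result elements needs n multiplies and about n adds,
    so the conventional count is 2 * n**3.
    """
    return 2 * n ** 3


def effective_mflops(n: int, seconds: float) -> float:
    """Effective rate in millions of floating point operations per second."""
    return matmul_flops(n) / seconds / 1.0e6


if __name__ == "__main__":
    # Two 100 x 100 matrices multiplied in 1 second, as quoted in the text.
    print(effective_mflops(100, 1.0))
```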

The MAP 6400 has a resident operating system and executive processor to
handle task sequencing and control functions. In addition, a SNAP-II library of
several hundred FORTRAN callable scientific and engineering functions is available.
CODE CONVERSION REQUIREMENTS


GENERAL 

The amount of effort required to convert a particular code to run on the
minicomputer system must be determined. This information together with the
expected life of the code should be sufficient to decide whether or not the
conversion would be justified. Any new algorithms or numerical techniques should
be implemented during the conversion process. The following discussion addresses
several factors which must be considered when determining conversion requirements.

ASSEMBLER  CODE 

Assembly language modules must be eliminated or converted to FORTRAN. If a
module was written to perform a special function which does not exist on the
minicomputer (call a PPU routine, attach a permanent file, request a tape), then
some redesign will be required to eliminate the code. If it was written to
optimize a function which could have been written in FORTRAN, then the code must
be converted. In either case, assembly language modules can require a significant
amount of time in the conversion process.

SEGMENTATION AND OVERLAYING

Main memory constraints could be extreme, depending upon the addressing
capability of the minicomputer. At this time only three types of minis can be
identified which are virtual memory machines, and which also satisfy the
additional constraints that we have imposed such as 32-bit architecture and
support of a fast and large capacity disk unit. These are the PRIME computers,
the DEC VAX-11/780, and the IBM 4300. Others, such as the Perkin-Elmer Model 8/32
and Model 3200, and the SEL 32/75 have megabyte addressability, while most others
are restricted to a 64K byte address space. Megabyte direct addressability
requires the costly addition of large amounts of main memory, while 64K byte
addressability results in a restrictive environment in which code conversion
could be a very costly process. If one of the virtual memory machines is
selected, then the problem of conforming to memory constraints could become one
of simply disassembling the segment or overlay structure to run in the larger
address space. In either case, memory constraints could pose some problems in
conversion.
INPUT/OUTPUT

Potentially the most troublesome area in the code conversion process is that
of I/O. The transportation of a large data base from one machine to another can
be a major problem. Inconsistencies and conversion problems can exist in word
size, character and numeric data formats, and file structures. Random access
files can be particularly difficult to transport, since they must first be
converted to sequential format.
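
A small sketch may make the word size and character format problem concrete.
It assumes the 60-bit word of the CDC machines with ten 6-bit character codes
packed in each word, leftmost character in the high-order bits. The function
names are hypothetical, and the mapping from 6-bit codes to the character set
of the target machine is left as a table supplied by the caller, since the
source does not specify one.

```python
from typing import Dict, List


def unpack_60bit_word(word: int) -> List[int]:
    """Split one 60-bit word into ten 6-bit character codes, leftmost first."""
    if not 0 <= word < (1 << 60):
        raise ValueError("word must fit in 60 bits")
    return [(word >> shift) & 0o77 for shift in range(54, -1, -6)]


def pack_60bit_word(codes: List[int]) -> int:
    """Inverse of unpack_60bit_word; used here only to build test data."""
    if len(codes) != 10 or any(not 0 <= c <= 0o77 for c in codes):
        raise ValueError("need exactly ten 6-bit codes")
    word = 0
    for code in codes:
        word = (word << 6) | code
    return word


def translate(codes: List[int], table: Dict[int, str]) -> str:
    """Map 6-bit codes to target characters; unknown codes become '?'."""
    return "".join(table.get(code, "?") for code in codes)


if __name__ == "__main__":
    sample = pack_60bit_word([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    print(unpack_60bit_word(sample))
```

Numeric data need a separate conversion step, because the floating point
layout differs between machines as well as the word length.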
ECONOMIC ANALYSIS


GENERAL 

The cost of purchasing and maintaining a minicomputer system is a very
important factor in determining whether or not the minicomputer approach to the
large code problem is feasible and justified. Therefore, it is important that all
costs and potential cost savings be identified. However, due to the subjective
nature and uncertainty of costs incurred as a result of poor turnaround and
losses due to lost output or improperly run jobs on the CYBER 176, these costs
are not included in the analysis. Only actual computer charges are considered.

The ALFA code was chosen from those identified in the Candidate Codes
section. Current costs of running this code include only actual dollar charges
for CYBER 176 time. However, the total analysis includes not only a comparison
of costs to run the ALFA code on the CYBER 176 and on the minicomputer, but also
the estimated costs to convert the code and document it.

In addition to the actual codes, the benchmark codes should be included
in the cost analysis. Due to the more strict definition of these codes, it is
easier to determine the total number of floating point operations performed and
the total amount of I/O involved. These numbers can be used to calculate a cost
per unit of CPU and I/O. For example, the costs might be expressed in dollars
per MFLOP and dollars per MWIO. Benchmark codes can be developed and run after
the feasibility of the proposed minicomputer solution to the large code problem
is accepted. It is highly desirable that a minicomputer of the type selected for
the baseline system be used to run the benchmark codes for the cost comparison.
However, if this is not possible, then the minicomputer costs would need to be
estimated.

ALFA  CYBER  176  COST  PROJECTIONS 

Table 1 shows that ALFA can require up to two CP hours per run. If we
assume a modest 6 hours per week of useful run time for the ALFA code for 40
weeks per year, and if we use a nominal cost of $800 per CP hour, then we find
that computer costs alone would reach $192,000 per year for just the ALFA code.

It seems clear that if other codes were included, much more substantial costs
would be evident.
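
The projection above is a simple product, sketched here so that other usage
assumptions can be tried. The function name is illustrative, and the weekly
run time is treated as chargeable CP hours, which is how the text appears to
use it.

```python
def annual_cp_cost(hours_per_week: float, weeks_per_year: float,
                   dollars_per_cp_hour: float) -> float:
    """Yearly computer charge for one code, in dollars."""
    return hours_per_week * weeks_per_year * dollars_per_cp_hour


if __name__ == "__main__":
    # Figures from the text: 6 hours per week, 40 weeks, $800 per CP hour.
    print(annual_cp_cost(6, 40, 800))
```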
ALFA CODE CONVERSION AND DOCUMENTATION COSTS

ALFA  and  the  DYNOIM  preprocessor  are  written  entirely  in  FORTRAN  and  are  not 
overlaid  or  segmented.  These  characteristics  indicate  that  the  conversion  would 
be  minimal.  However,  let  us  assume  that  six  person-months  would  be  required  to 
convert  and  fully  document  the  code.  Then  the  cost  for  this  effort  would  be 
approximately  $30,000  in  labor.  Computer  costs  should  be  minimal,  but  again  we 
assume  an  inflated  figure  of  2  hours  of  CYBER  176  time,  or  $1600. 

PRIME  750  COMPUTER  SYSTEM 

The PRIME 750 with 2M bytes of MOS error-correcting memory was chosen as
the representative baseline system for this application. Appropriate mass
storage, I/O peripherals, communications hardware and software, and graphics
hardware are also included in the initial configuration. In addition, a
nine-track tape unit is included for code and results transportability between
the PRIME and CYBER machines.

Table 3 lists the hardware and software required together with the purchase
and monthly maintenance costs. It can be seen that complete cost recovery can
be realized in less than 3 years if just the ALFA code is considered, since the
total cost of the hardware, software, 3 years maintenance, and ALFA conversion
and documentation is approximately $540,776.
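
The cost recovery claim can be checked by dividing the total outlay by the
yearly CYBER 176 charge avoided. The sketch below uses the $540,776 total from
this section and the $192,000 per year projected for ALFA. It is a
simplification: the total already contains a full 3 years of maintenance, so
the result is slightly conservative. The function name is illustrative.

```python
def payback_years(total_cost: float, annual_saving: float) -> float:
    """Years needed for the avoided charges to equal the total outlay."""
    if annual_saving <= 0:
        raise ValueError("annual saving must be positive")
    return total_cost / annual_saving


if __name__ == "__main__":
    years = payback_years(540776, 192000)
    print(round(years, 2))
```

The result is a little over 2.8 years, which agrees with the statement that
recovery takes less than 3 years.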




AFWL-TR-80-20 


Table 3. PRIME 750 hardware and software with purchase and monthly maintenance
costs. [Table contents are illegible in the source scan.]



BENCHMARK  SET 


GENERAL 

Programs must be written to allow the performance of various systems to be
measured and compared. These programs should be written in ANSI FORTRAN and
should contain no code which is dependent upon such hardware characteristics
as word size. Two types of programs should be developed. One is the type which
measures some performance characteristic such as floating point speed or I/O rate.
We refer to this type as a quantitative benchmark. The second is the type which
simulates the processing and I/O requirements of an actual code, and is called a
qualitative benchmark.

QUANTITATIVE  BENCHMARKS 

As  a  minimum,  three  quantitative  benchmark  codes  should  be  developed  to 
measure  floating  point  speed,  floating  point  accuracy,  and  I/O  performance. 
Floating  point  speed  should  be  measured  for  scalar  and  vector  add,  multiply  and 
divide  operations  in  single  and  double  precision.  Floating  point  accuracy 
should  be  measured  in  terms  of  range  of  the  exponent  and  number  of  significant 
digits  available  in  the  mantissa.  The  I/O  performance  benchmark  should  measure 
the  number  of  bytes  per  second  which  can  be  transferred  between  memory  and  the 
disk. 
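
A minimal sketch of the floating point accuracy measurement follows. The
report calls for the benchmarks to be written in ANSI FORTRAN; Python is used
here only to show the technique, and the function names are illustrative. The
mantissa length is found by halving an increment until adding it to 1.0 no
longer changes the sum, and the exponent range by doubling until overflow.

```python
import math


def mantissa_bits() -> int:
    """Count significant bits by halving eps until 1.0 + eps rounds to 1.0."""
    bits = 0
    eps = 1.0
    while 1.0 + eps != 1.0:
        eps /= 2.0
        bits += 1
    return bits


def max_binary_exponent() -> int:
    """Largest n for which 2.0**n is still finite, found by repeated doubling."""
    x = 1.0
    n = 0
    while not math.isinf(x * 2.0):
        x *= 2.0
        n += 1
    return n


def decimal_digits(bits: int) -> float:
    """Equivalent number of significant decimal digits."""
    return bits * math.log10(2.0)


if __name__ == "__main__":
    b = mantissa_bits()
    print(b, round(decimal_digits(b), 1), max_binary_exponent())
```

Because the routines probe the arithmetic rather than assume a word size, the
same method carries over to machines with different floating point formats.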

QUALITATIVE  BENCHMARKS 

A  qualitative  benchmark  measures  how  well  the  machine  executes  a  particular 
code.  A  single  performance  characteristic  such  as  I/O  rate  is  not  measured. 
Rather,  the  code  is  executed  from  start  to  finish  and  the  elapsed  time  becomes 
the  measure  of  performance  for  that  code.  Other  measures  can  be  obtained  by 
running  two  or  more  qualitative  benchmarks  in  a  multitasking  environment. 

BENCHMARKING AN ARRAY PROCESSOR

If the benchmark is to be run on a minicomputer equipped with an array
processor, it will be necessary to modify the CYBER 176 version of the program to
take advantage of the vector processing capabilities of the minicomputer. The
array processor is treated as an allocatable peripheral device; it is assigned
to one task or program for the duration of execution of that program. A library
of FORTRAN callable subroutines is resident in the minicomputer to serve as an
interface between the calling program and the array processor. Calls to these
subroutines must be inserted in appropriate places and substituted for existing
loops to execute code as vector operations.
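
The substitution described above can be sketched as follows. The routine vadd
is a hypothetical stand-in for a vendor library call and is implemented here
in plain Python so the example runs; it does not represent any actual array
processor interface. The point is only the shape of the change: an
element-by-element loop becomes a single call on whole vectors.

```python
from typing import List


def vadd(a: List[float], b: List[float]) -> List[float]:
    """Hypothetical library routine: element-wise sum of two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


def update_scalar(u: List[float], du: List[float]) -> List[float]:
    """Original form: an explicit loop over the elements."""
    out = [0.0] * len(u)
    for i in range(len(u)):
        out[i] = u[i] + du[i]
    return out


def update_vector(u: List[float], du: List[float]) -> List[float]:
    """Converted form: the loop is replaced by one library call."""
    return vadd(u, du)


if __name__ == "__main__":
    u = [1.0, 2.0, 3.0]
    du = [0.5, 0.25, 0.125]
    print(update_scalar(u, du) == update_vector(u, du))
```

Comparing the two forms on the same data, as in the last lines, is a simple
way to confirm that a converted loop still produces the original results.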
CONCLUSIONS AND RECOMMENDATIONS

This study has shown that sufficiently powerful hardware is available in the
minicomputer and array processor markets to provide an environment for high-speed
computation for some of the large codes now running at AFWL. These codes
currently require a very large share of the computer resources available to users
of the AFWL scientific computer facility. If they were converted to run on a
minicomputer equipped with a large amount of mass storage, high-speed
computational capability, and a high degree of main memory addressability, they
could be run more efficiently with a resulting decrease in the computational load
on the large computers.

Three  possible  computers  were  recommended.  The  PRIME  750  was  chosen  as  a 
baseline  system,  and  the  resulting  economic  analysis  shows  that  the  minicomputer 
solution  to  the  large  code  problem  is  economically  justified.  Indeed,  using  only 
the  cost  of  running  the  ALFA  code  shows  that  the  minicomputer  would  pay  for 
itself  in  less  than  3  years. 

It is recommended that work on this project be continued. A benchmark set
should be developed to demonstrate the computing power available in the
minicomputer market and the cost effectiveness of this approach. When this is
complete and the results of this report are justified, actions should be
initiated to purchase a suitable minicomputer system to be installed at the AFWL.
The system should be made available to selected users who are responsible for
running those codes which are best suited to conversion to the minicomputer
environment. A significant reduction in the cost of running these codes, as well
as a reduction in the work backlog on the large machines, would result.
ABBREVIATIONS AND SYMBOLS

CP      Central processor; the computational unit of a computer system

ECS     Extended core storage; an extension of the high speed main memory
        on the CDC 6600

G       Abbreviation for 1024^3; 1G byte (one gigabyte) is slightly more
        than one billion bytes; 1,073,741,824

I/O     Input/output

K       Abbreviation for 1024; 1K byte (one kilobyte)

LCM     Large core memory; an extension of the high speed main memory on the
        CDC CYBER 176

M       Abbreviation for 1024^2; 1M byte (one megabyte) is slightly more than
        one million bytes; 1,048,576

MFLOP   Millions of floating point operations

MFLOPS  Millions of floating point operations per second

MWIO    Millions of words of input and output

ns      Nanosecond; one-billionth of a second

RJE     Remote job entry; a means of entering a job into a computer remotely
        over telephone lines